Overview

This document provides an overview of the FAA flight delay data. This document currently reviews the 2015, 2016, and 2017 Airline Service Quality Performance (ASQP) data; 2018 data is in process.

The following table summarizes the counts of flights by carrier across the three currently available years.

Counts of flights by year and carrier in ASQP data.
Carrier 2015 2016 2017
AA 725,984 914,495 896,348
AS 172,521 177,280 185,068
B6 267,048 282,473 298,654
DL 875,881 922,746 923,560
EV 571,977 490,990 339,541
F9 90,836 95,121 103,027
HA 76,272 76,789 80,172
MQ 294,632 NA NA
NK 117,379 138,199 156,818
OO 588,353 605,933 706,527
UA 515,723 545,067 584,481
US 198,715 NA NA
VX 61,903 69,121 70,981
WN 1,261,855 1,299,444 1,329,444
Total 5,819,079 5,617,658 5,674,621

Variables

The following tables summarize the variables being retained from the complete ASQP data. These tables allow for a quality assurance check on completeness and correctness of the data for each year.

2015

The data frame has 5,819,079 rows and 49 columns.

Overview of 2015 ASQP data. For factor variables, most frequent value is shown.
Variables Class N_unique Min_numeric Max_numeric Top_factor
YEAR numeric 1 2015 2015
QUARTER numeric 4 1 4
MONTH numeric 12 1 12
DAY_OF_MONTH numeric 31 1 31
DAY_OF_WEEK factor 7 4
FLIGHT_DATE Date 365
AIRLINE_ID factor 14 19393
CARRIER factor 14 WN
FLIGHT_NUM factor 6952 469
ORIGIN factor 322 ATL
ORIGIN_CITY_NAME factor 309 Chicago
DEST factor 322 ATL
DEST_CITY_NAME factor 309 Chicago
CRS_DEP_TIME_HR numeric 24 0 23
CRS_DEP_TIME_MIN numeric 60 0 59
DEP_TIME_HR numeric 26 0 24
DEP_TIME_MIN numeric 61 0 59
DEP_DELAY numeric 1218 -82 1988
DEP_DELAY_MINS numeric 1164 0 1988
DEP_DELAY_15 numeric 3 0 1
DEP_DELAY_GRPS numeric 16 -2 12
DEP_TIME_BLK character 19
TAXI_OUT numeric 185 1 225
WHEELS_OFF numeric 1441 1 2400
WHEELS_ON numeric 1441 1 2400
TAXI_IN numeric 186 1 248
CRS_ARR_TIME_HR numeric 25 0 24
CRS_ARR_TIME_MIN numeric 60 0 59
ARR_TIME_HR numeric 26 0 24
ARR_TIME_MIN numeric 61 0 59
ARR_DELAY numeric 1241 -87 1971
ARR_DELAY_MINS numeric 1158 0 1971
ARR_DELAY_15 numeric 3 0 1
ARR_DELAY_GRPS numeric 16 -2 12
ARR_TIME_BLK character 19
CANCELLED numeric 2 0 1
CANCELLATION_CODE factor 5 B
DIVERTED numeric 2 0 1
CRS_ELAPSED_TIME numeric 551 18 718
ACTUAL_ELAPSED_TIME numeric 713 14 766
AIR_TIME numeric 676 7 690
FLIGHTS numeric 1 1 1
DISTANCE numeric 1363 21 4983
DISTANCE_GRP numeric 11 1 11
CARRIER_DELAY numeric 1068 0 1971
WEATHER_DELAY numeric 633 0 1211
NAS_DELAY numeric 571 0 1134
SECURITY_DELAY numeric 155 0 573
LATE_AIRCRAFT_DELAY numeric 696 0 1331

2016

The data frame has 5,617,658 rows and 49 columns.

Overview of 2016 ASQP data. For factor variables, most frequent value is shown.
Variables Class N_unique Min_numeric Max_numeric Top_factor
YEAR numeric 1 2016 2016
QUARTER numeric 4 1 4
MONTH numeric 12 1 12
DAY_OF_MONTH numeric 31 1 31
DAY_OF_WEEK factor 7 5
FLIGHT_DATE Date 366
AIRLINE_ID factor 12 19393
CARRIER factor 12 WN
FLIGHT_NUM factor 6974 511
ORIGIN factor 313 ATL
ORIGIN_CITY_NAME factor 301 Atlanta
DEST factor 310 ATL
DEST_CITY_NAME factor 299 Atlanta
CRS_DEP_TIME_HR numeric 24 0 23
CRS_DEP_TIME_MIN numeric 60 0 59
DEP_TIME_HR numeric 26 0 24
DEP_TIME_MIN numeric 61 0 59
DEP_DELAY numeric 1339 -204 2149
DEP_DELAY_MINS numeric 1274 0 2149
DEP_DELAY_15 numeric 3 0 1
DEP_DELAY_GRPS numeric 16 -2 12
DEP_TIME_BLK character 19
TAXI_OUT numeric 179 1 186
WHEELS_OFF numeric 1441 1 2400
WHEELS_ON numeric 1441 1 2400
TAXI_IN numeric 194 1 250
CRS_ARR_TIME_HR numeric 25 0 24
CRS_ARR_TIME_MIN numeric 60 0 59
ARR_TIME_HR numeric 26 0 24
ARR_TIME_MIN numeric 61 0 59
ARR_DELAY numeric 1388 -152 2142
ARR_DELAY_MINS numeric 1284 0 2142
ARR_DELAY_15 numeric 3 0 1
ARR_DELAY_GRPS numeric 16 -2 12
ARR_TIME_BLK character 19
CANCELLED numeric 2 0 1
CANCELLATION_CODE factor 5 B
DIVERTED numeric 2 0 1
CRS_ELAPSED_TIME numeric 576 5 705
ACTUAL_ELAPSED_TIME numeric 717 14 778
AIR_TIME numeric 693 4 723
FLIGHTS numeric 1 1 1
DISTANCE numeric 1364 25 4983
DISTANCE_GRP numeric 11 1 11
CARRIER_DELAY numeric 1209 0 2142
WEATHER_DELAY numeric 690 0 1157
NAS_DELAY numeric 611 0 1446
SECURITY_DELAY numeric 174 0 474
LATE_AIRCRAFT_DELAY numeric 754 0 1484

2017

The data frame has 5,674,621 rows and 49 columns.

Overview of 2017 ASQP data. For factor variables, most frequent value is shown.
Variables Class N_unique Min_numeric Max_numeric Top_factor
YEAR numeric 1 2017 2017
QUARTER numeric 4 1 4
MONTH numeric 12 1 12
DAY_OF_MONTH numeric 31 1 31
DAY_OF_WEEK factor 7 5
FLIGHT_DATE Date 365
AIRLINE_ID factor 12 19393
CARRIER factor 12 WN
FLIGHT_NUM factor 7076 403
ORIGIN factor 320 ATL
ORIGIN_CITY_NAME factor 308 Atlanta
DEST factor 320 ATL
DEST_CITY_NAME factor 308 Atlanta
CRS_DEP_TIME_HR numeric 24 0 23
CRS_DEP_TIME_MIN numeric 60 0 59
DEP_TIME_HR numeric 26 0 24
DEP_TIME_MIN numeric 61 0 59
DEP_DELAY numeric 1496 -234 2755
DEP_DELAY_MINS numeric 1418 0 2755
DEP_DELAY_15 numeric 3 0 1
DEP_DELAY_GRPS numeric 16 -2 12
DEP_TIME_BLK character 19
TAXI_OUT numeric 181 0 183
WHEELS_OFF numeric 1441 1 2400
WHEELS_ON numeric 1441 1 2400
TAXI_IN numeric 257 0 414
CRS_ARR_TIME_HR numeric 24 0 23
CRS_ARR_TIME_MIN numeric 60 0 59
ARR_TIME_HR numeric 26 0 24
ARR_TIME_MIN numeric 61 0 59
ARR_DELAY numeric 1510 -238 2189
ARR_DELAY_MINS numeric 1406 0 2189
ARR_DELAY_15 numeric 3 0 1
ARR_DELAY_GRPS numeric 16 -2 12
ARR_TIME_BLK character 19
CANCELLED numeric 2 0 1
CANCELLATION_CODE factor 5 B
DIVERTED numeric 2 0 1
CRS_ELAPSED_TIME numeric 583 1 718
ACTUAL_ELAPSED_TIME numeric 721 15 784
AIR_TIME numeric 689 7 712
FLIGHTS numeric 1 1 1
DISTANCE numeric 1399 31 4983
DISTANCE_GRP numeric 11 1 11
CARRIER_DELAY numeric 1330 0 1934
WEATHER_DELAY numeric 813 0 1934
NAS_DELAY numeric 822 0 1605
SECURITY_DELAY numeric 192 0 827
LATE_AIRCRAFT_DELAY numeric 861 0 1756

Visualization

For each year, the following visuals provide a check in the delay data provided in ASQP. For this project, we will calculate delay based on wheels-off to wheels-on compared to the scheduled departure and arrival.

Plotting delays by day and airline

2015

2016

2017

Wheels off/on time compared

Confirming:

  WHEELS_OFF = DEP_TIME + TAXI_OUT
  WHEELS_ON  = ARR_TIME - TAXI_IN
Example time data for DEP, TAXI, WHEELS
DEP_TIME_HR DEP_TIME_MIN TAXI_OUT WHEELS_OFF WHEELS_ON TAXI_IN ARR_TIME_HR ARR_TIME_MIN
8 55 17 912 1230 7 12 37
8 50 15 905 1202 9 12 11
8 53 15 908 1138 13 11 51
8 53 14 907 1159 19 12 18
8 53 27 920 1158 24 12 22
8 56 85 1021 1256 4 13 0

Scheduled time:

  • CRS = Computerized Reservation System
Example WHEELS and CRS time
CRS_DEP_TIME_HR CRS_DEP_TIME_MIN WHEELS_OFF WHEELS_ON CRS_ARR_TIME_HR CRS_ARR_TIME_MIN
9 0 912 1230 12 30
9 0 905 1202 12 30
9 0 908 1138 12 30
9 0 907 1159 12 30
9 0 920 1158 12 30
9 0 1021 1256 12 35

Notes

Questions (for us to answer after reading the ASQP documentation):

  • NULL values for WHEELS_OFF or WHEELS_ON indicate no value in input data. What reasons might lead to missing values?
  • CRS time is the Computerized Reservation System. Should we take CRS time as the scheduled time that a consumer would see ‘at booking’?

Data processing notes:

  • WAC (World Area Code) is synonymous with state, we have omitted it to save space.
  • Have kept ORIGIN_CITY_NAME because assume weather events at one city will be shared across airports

Additional data:

  • Table of lat / long for each airport. Obtained from BTS here, will update data after integrating.
  • Time zone for departure and arrival airports. We will want this so that we can do our own calculations of time differences; we can generate this given lat longs of each airport if needed. Now that we have lat long, can overlay with timezone shapefile.

Merger of AA and US

Counts of flights by month for 2015, by carrier, show the effect of the merger between AA and US Airways in 2015 (AA/US merger; Airline codes here). Other years do not show any similar breaks in count of flights by month and carrier.

Count of flights by month of 2015, by carrier
1 2 3 4 5 6 7 8 9 10 11 12
AA 44,059 39,835 45,966 44,770 44,710 44,360 81,434 79,748 73,379 77,290 73,871 76,562
AS 13,257 12,194 14,276 13,974 14,682 15,075 15,821 16,095 14,271 14,467 13,950 14,459
B6 21,623 19,751 22,590 22,020 22,565 22,558 24,029 23,826 21,133 21,913 21,697 23,343
DL 64,421 60,884 74,166 72,170 74,815 77,255 80,741 80,947 72,063 75,552 72,228 70,639
EV 49,925 45,138 54,190 49,296 49,213 49,119 50,381 48,554 43,721 45,728 42,572 44,140
F9 6,829 5,809 6,950 7,148 8,118 7,893 8,090 8,142 7,873 8,101 7,763 8,120
HA 6,440 5,779 6,313 6,093 6,434 6,677 6,955 6,901 6,154 6,242 6,024 6,260
MQ 29,900 26,940 28,146 25,695 25,431 25,407 24,750 23,881 21,202 21,982 20,305 20,993
NK 8,743 8,089 9,400 9,496 10,051 9,826 10,351 10,432 9,948 10,208 10,164 10,671
OO 48,114 43,989 50,078 49,329 49,864 50,307 52,627 52,730 47,625 48,808 47,292 47,590
UA 38,395 36,235 43,603 41,342 44,411 46,084 46,478 45,413 41,778 45,894 42,647 43,443
US 33,489 30,153 34,516 32,496 33,761 34,300 0 0 0 0 0 0
VX 4,731 4,223 4,873 4,915 5,236 5,260 5,411 5,688 5,154 5,464 5,414 5,534
WN 100,042 90,172 109,245 106,407 107,702 109,776 113,650 108,179 100,645 104,516 104,045 107,476